



American Statistical Association.



THE COEFFICIENT OF CORRELATION.* 

By William Gardner Reed, U. S. Weather Bureau. 



In many studies it is necessary or at least desirable to test
the existence of concomitant variation between two series of
variable quantities. A comparison of the plotted variables
furnishes a rough, but for some purposes adequate, means of
examining the relationship. Figure 1 is an example of this
sort of comparison. However, the use of curves is not to be
recommended for careful work because of the difficulty in
selecting the proper scales and the dangers resulting from
personal bias. The usual tabular method is slightly more refined,
but tables involve too many figures to give an adequate idea
of the conditions and give no concise measure of the degree of
relationship.

[Figure 1. RELATION BETWEEN THE JULY RAINFALL AND THE YIELD OF CORN, 1888-1915.
Left scale: rainfall departure, +3.0 to -3.0. Right scale: yield departure, +12 to -12.
The solid line indicates the departure of the average rainfall from the normal for
the month of July over the following-named States, for the 28 years indicated: Ohio, Indiana,
Illinois, Iowa, Nebraska, Kansas, Missouri, and Kentucky.
The broken line shows the departure of the average yield of corn from the normal, in
bushels per acre, for the same area and period.
U. S. Weather Bur., National Weather and Crop Bul., Ser. 1916, No. [illegible], June 20, 1916.]

* For instructions for obtaining the coefficient of correlation see Persons, W. M.: The Correlation of 
Economic Statistics, Am. Stat. Asso., Quart. Publ., Vol. 12, pp. 287-322, Boston, 1910. 




The English biometricians have perfected a method of
stating the degree of relationship, which was invented by
Bravais about 1845. "Correlation may be briefly defined as
the tendency towards concomitant variation and the so-called
correlation coefficient is simply a measure of such tendency,
more or less adequate according to the circumstances of the case."*
The early statements of the use of the coefficient of correlation
indicate clearly that the attempt to obtain such a coefficient
from miscellaneous material is an abuse of this method
of measuring relationship.† The material in hand should be
investigated carefully before any attempt is made to determine
the relationship by the use of the coefficient of correlation.
This investigation may take the form of a correlation
table or of a "dot chart" after Galton's graphic method of
correlation.‡

METHOD OF PROCEDURE. 

If the coefficient of correlation is to have any definite 
meaning, the procedure must be somewhat as follows : 

1. The material (e. g. Table I) should be arranged in groups
in the form of a correlation table (Table II), or, better, plotted
as a dot chart (Figure 2). The table or chart should then be
carefully examined to see whether the points may be generalized
to a straight line, that is, whether there is a tendency
for a high value of one variable to be associated with high
values of the other variable and proportionately higher or
lower values of the one to be associated with similar values of
the other. This shows positive linear correlation. When lower
values of the one are associated with higher values of the other,
the correlation is said to be negative. For example, the dots
in Figure 2 may be generalized to the line AB as well as to
any curve.

* Brown, W.: The Essentials of Mental Measurement, Cambridge, University Press, 1911, p. 42. 
(Italics are the present writer's.) 

† Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912, pp. 169,
177.

‡ See Davenport, C. B.: Statistical Methods, ed. 3, New York, Wiley, 1914, pp. 42-47.




TABLE I.

CORRELATION OF JULY RAINFALL AND THE YIELD OF CORN IN OHIO.
(Smith, J. W.: The Effect of Weather upon the Yield of Corn in Ohio. Washington,
Mo. Weather Rev., Vol. 42, 1914, p. 80.)

         July Rainfall.                Yield of Corn.
Year.   Amount.    x.      x².     Bushels      y.     y².      xy.
                                   per Acre.
1854      2.6    - 1.4    1.96      26.0      -  9     81    + 12.6
1855      5.8    + 1.8    3.24      39.7      +  5     25    +  9.0
1856      2.6    - 1.4    1.96      27.7      -  7     49    +  9.8
1857      4.9    +  .9     .81      36.6      +  2      4    +  1.8
1858      4.7    +  .7     .49      27.7      -  7     49    -  4.9
1859      1.6    - 2.4    5.76      29.5      -  5     25    + 12.0
1860      5.8    + 1.8    3.24      38.2      +  3      9    +  5.4
1861      3.3    -  .7     .49      33.5      -  1      1    +   .7
1862      3.6    -  .4     .16      30.0      -  5     25    +  2.0
1863      2.6    - 1.4    1.96      27.0      -  8     64    + 11.2
1864      2.1    - 1.9    3.61      27.0      -  8     64    + 15.2
1865      5.7    + 1.7    2.89      35.0         0      0         0
1866      5.1    + 1.1    1.21      36.5      +  2      4    +  2.2
1867      3.2    -  .8     .64      29.8      -  5     25    +  4.0
1868      2.7    - 1.3    1.69      34.4      -  1      1    +  1.3
1869      4.8    +  .8     .64      28.4      -  7     49    -  5.6
1870      4.7    +  .7     .49      37.5      +  3      9    +  2.1
1871      3.7    -  .3     .09      36.7      +  2      4    -   .6
1872      6.7    + 2.7    7.29      40.9      +  6     36    + 16.2
1873      6.2    + 2.2    4.84      35.1         0      0         0
1874      3.8    -  .2     .04      39.2      +  4     16    -   .8
1875      6.9    + 2.9    8.41      34.2      -  1      1    -  2.9
1876      6.4    + 2.4    5.76      36.9      +  2      4    +  4.8
1877      3.7    -  .3     .09      32.5      -  2      4    +   .6
1878      5.4    + 1.4    1.96      37.8      +  3      9    +  4.2
1879      4.2    +  .2     .04      34.3      -  1      1    -   .2
1880      4.2    +  .2     .04      38.9      +  4     16    +   .8
1881      3.6    -  .4     .16      31.0      -  4     16    -  1.6
1882      3.2    -  .8     .64      34.0      -  1      1    +   .8
1883      4.2    +  .2     .04      24.2      - 11    121    -  2.2
1884      3.8    -  .2     .04      33.3      -  2      4    +   .4
1885      3.2    -  .8     .64      36.8      +  2      4    -  1.6
1886      2.9    - 1.1    1.21      33.5      -  1      1    +  1.1
1887      2.2    - 1.8    3.24      30.5      -  4     16    +  7.2
1888      4.4    +  .4     .16      38.9      +  4     16    +  1.6
1889      4.2    +  .2     .04      32.3      -  3      9    -   .6
1890      2.0    - 2.0    4.00      24.6      - 10    100    + 20.0
1891      3.8    -  .2     .04      35.6      +  1      1    -   .2
1892      3.8    -  .2     .04      33.3      -  2      4    +   .4
1893      2.5    - 1.5    2.25      29.1      -  6     36    +  9.0
1894      1.6    - 2.4    5.76      32.6      -  2      4    +  4.8
1895      2.0    - 2.0    4.00      33.7      -  1      1    +  2.0
1896      8.1    + 4.1   16.81      41.7      +  7     49    + 28.7
1897      4.6    +  .6     .36      34.3      -  1      1    -   .6
1898      4.0       0       0       37.4      +  2      4         0
1899      4.2    +  .2     .04      38.1      +  3      9    +   .6
1900      4.6    +  .6     .36      42.6      +  8     64    +  4.8
1901      2.7    - 1.3    1.69      30.0      -  5     25    +  6.5
1902      4.7    +  .7     .49      38.8      +  4     16    +  2.8
1903      3.7    -  .3     .09      31.5      -  3      9    +   .9
1904      4.1    +  .1     .01      32.8      -  2      4    -   .2
1905      3.9    -  .1     .01      37.9      +  3      9    -   .3
1906      5.1    + 1.1    1.21      42.2      +  7     49    +  7.7
1907      5.4    + 1.4    1.96      34.8         0      0         0
1908      4.1    +  .1     .01      36.1      +  1      1    +   .1
1909      3.8    -  .2     .04      38.7      +  4     16    -   .8
1910      3.2    -  .8     .64      36.6      +  2      4    -  1.6
1911      2.4    - 1.6    2.56      38.6      +  4     16    -  6.4
1912      5.7    + 1.7    2.89      42.8      +  8     64    + 13.6
1913      5.2    + 1.2    1.44      37.8      +  3      9    +  3.6

Sum of negative departures  -30.2                 -125
Sum of positive departures  +34.1                 + 99
Totals                      + 3.9  112.67         - 26   1258    +201.4










[Figure 2. CORRELATION BETWEEN JULY PRECIPITATION AND YIELD OF CORN IN OHIO.
Dot chart. Horizontal axis: July precipitation in inches. Vertical axis: yield of corn.
AB = line of relation.]




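$M'_x = 4.0$ inches, $M'_y = 35$ bu.

\[ M_x = 4.0 + \frac{3.9}{60} = 4.1 \qquad M_y = 35 - \frac{26}{60} = 34.6 \]

\[ \Sigma x = +3.9 \qquad \Sigma y = -26 \]

\[ \Sigma x^2 = 112.67 \qquad \Sigma y^2 = 1258 \]

\[ \sigma_x = \sqrt{\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2}
            = \sqrt{\frac{112.67}{60} - \left(\frac{3.9}{60}\right)^2} = 1.4 \]

\[ \sigma_y = \sqrt{\frac{\Sigma y^2}{n} - \left(\frac{\Sigma y}{n}\right)^2}
            = \sqrt{\frac{1258}{60} - \left(\frac{-26}{60}\right)^2} = 4.6 \]

\[ \Sigma xy = 201.4 \]

\[ r = \frac{\dfrac{\Sigma xy}{n} - \dfrac{\Sigma x}{n}\cdot\dfrac{\Sigma y}{n}}{\sigma_x \sigma_y}
     = \frac{\dfrac{201.4}{60} - \dfrac{3.9}{60}\times\dfrac{-26}{60}}{1.4 \times 4.6}
     = \frac{3.36 + .03}{6.44} = 0.526 \pm E_r \]

\[ E_r = \pm .674\,\frac{1 - r^2}{\sqrt{n}} = \pm .674 \times \frac{.723}{7.7} = \pm .063 \]

\[ r = +0.526 \pm .063 \]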

Note: r is not the same here as in the original paper 
because a single average yield of corn has been used for sim- 
plicity. 
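
The arithmetic above can be retraced with a short program. The following is a minimal sketch in Python, not part of the original paper; the function names are illustrative assumptions, and only the totals printed at the foot of Table I are used. It suggests that the published 0.526 depends on rounding the standard deviations to 1.4 and 4.6 before dividing; with unrounded standard deviations the same totals give a somewhat larger value (about 0.54).

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r, sigma_x, sigma_y) from sums of departures x and y
    measured from arbitrary working origins (M'x and M'y in the text)."""
    mean_dx = sum_x / n                      # correction to the working origin for x
    mean_dy = sum_y / n                      # correction to the working origin for y
    sigma_x = math.sqrt(sum_x2 / n - mean_dx ** 2)
    sigma_y = math.sqrt(sum_y2 / n - mean_dy ** 2)
    r = (sum_xy / n - mean_dx * mean_dy) / (sigma_x * sigma_y)
    return r, sigma_x, sigma_y

def probable_error(r, n):
    """Probable error of r as defined in the text: .674 (1 - r^2) / sqrt(n)."""
    return 0.674 * (1.0 - r ** 2) / math.sqrt(n)

# Totals printed at the foot of Table I (n = 60 years).
N = 60
r_exact, sigma_x, sigma_y = correlation_from_sums(N, 3.9, -26.0, 112.67, 1258.0, 201.4)

# The paper rounds the standard deviations to one decimal before dividing.
numerator = 201.4 / N - (3.9 / N) * (-26.0 / N)
r_rounded = numerator / (round(sigma_x, 1) * round(sigma_y, 1))

print(f"sigma_x = {sigma_x:.2f}, sigma_y = {sigma_y:.2f}")
print(f"r with unrounded standard deviations = {r_exact:.3f}")
print(f"r with standard deviations rounded as in the paper = {r_rounded:.3f}")
print(f"probable error for r = 0.526: {probable_error(0.526, N):.3f}")
```

The difference between the two values of r is a matter of rounding only; it does not affect the conclusion that the coefficient is many times its probable error.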

EXPLANATION OF SYMBOLS. 

$n$             number of observations (years of record)
$M_x$           true mean July precipitation
$M'_x$          some arbitrary number near $M_x$
$M_y$           true mean yield of corn
$M'_y$          some arbitrary number near $M_y$
$x$             departure of each July precipitation from $M'_x$
$y$             departure of yield of corn in each year from $M'_y$
$\Sigma x$      algebraic sum of departures of July precipitation
$\Sigma y$      algebraic sum of departures of yield of corn
$\Sigma x^2$    algebraic sum of squares of departures of July precipitation
$\Sigma y^2$    algebraic sum of squares of departures of yield of corn
$\Sigma xy$     algebraic sum of products of departures ($x$ and $y$)
$\sigma_x$      standard deviation of July precipitation

\[ \sigma_x = \sqrt{\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2} \]

$\sigma_y$      standard deviation of yield of corn

\[ \sigma_y = \sqrt{\frac{\Sigma y^2}{n} - \left(\frac{\Sigma y}{n}\right)^2} \]

$r$             coefficient of correlation

\[ r = \frac{\dfrac{\Sigma xy}{n} - \dfrac{\Sigma x}{n}\cdot\dfrac{\Sigma y}{n}}{\sigma_x \sigma_y} \]

$E_r$           probable error of the coefficient of correlation

\[ E_r = \pm .674\,\frac{1 - r^2}{\sqrt{n}} \]
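
The formulas for the standard deviations and for r are written in terms of departures from the arbitrary numbers M'x and M'y rather than from the true means. A short derivation, offered here as a sketch and not taken from the original paper, shows why this is permissible. The symbols $X_i$ and $Y_i$ for the original observations are notation assumed for this illustration only.

```latex
\[
\begin{aligned}
X_i &= M'_x + x_i, \qquad
M_x = \frac{1}{n}\sum X_i = M'_x + \frac{\Sigma x}{n}, \\[4pt]
\sigma_x^2 &= \frac{1}{n}\sum \left(X_i - M_x\right)^2
            = \frac{1}{n}\sum \left(x_i - \frac{\Sigma x}{n}\right)^2
            = \frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2, \\[4pt]
\frac{1}{n}\sum \left(X_i - M_x\right)\left(Y_i - M_y\right)
           &= \frac{1}{n}\sum \left(x_i - \frac{\Sigma x}{n}\right)\left(y_i - \frac{\Sigma y}{n}\right)
            = \frac{\Sigma xy}{n} - \frac{\Sigma x}{n}\cdot\frac{\Sigma y}{n}.
\end{aligned}
\]
```

Dividing the last line by $\sigma_x \sigma_y$ gives the expression for r stated above, so the choice of M'x and M'y affects only the labor of computation and not the result.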



TABLE II.

CORRELATION TABLE SHOWING THE RELATION BETWEEN JULY PRECIPITATION
AND THE YIELD OF CORN IN OHIO.

(From Smith, J. W.: The Effect of Weather on the Yield of Corn, Washington, Mo. Weather Rev.,
Vol. 42, 1914, pp. 78-93.)

                           Yield of Corn in Bushels per Acre.
July Precipitation    20.0 to   25.0 to   30.0 to   35.0 to   40.0 to
in Inches.             24.9      29.9      34.9      39.9      44.9

8.0-8.9                 ..        ..        ..        ..         1
7.0-7.9                 ..        ..        ..        ..        ..
6.0-6.9                 ..        ..         1         1         1
5.0-5.9                 ..        ..         1         7         2
4.0-4.9                  1         2         4         8         1
3.0-3.9                 ..         1         8         7        ..
2.0-2.9                  1         5         5         1        ..
1.0-1.9                 ..         1         1        ..        ..
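
A correlation table of this kind is simply a count of the observations falling in each pair of classes. The following Python sketch illustrates the tallying; the function names and the choice of class widths (1 inch and 5 bushels) are assumptions made for the illustration, and only the first five years of Table I are used as sample data.

```python
import math
from collections import Counter

def class_floor(value, width):
    """Lower limit of the class of the given width that contains value."""
    return math.floor(value / width) * width

def correlation_table(pairs, x_width, y_width):
    """Count how many (x, y) observations fall in each pair of classes."""
    counts = Counter()
    for x, y in pairs:
        counts[(class_floor(x, x_width), class_floor(y, y_width))] += 1
    return counts

# First five years of Table I: (July rainfall in inches, yield in bushels per acre).
sample = [(2.6, 26.0), (5.8, 39.7), (2.6, 27.7), (4.9, 36.6), (4.7, 27.7)]

table = correlation_table(sample, x_width=1.0, y_width=5.0)
for (x_low, y_low), count in sorted(table.items()):
    print(f"rainfall from {x_low:.1f}, yield from {y_low:.1f}: {count} year(s)")
```

Applied to all sixty years, the same tally gives the cell frequencies of a table in the form of Table II, which can then be inspected for a straight-line tendency before any coefficient is computed.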










2. If it appears from this examination that a straight line
is as good a fit as any other type of curve not too complicated
to be useful as a measure of relationship, the data may be
replotted on a new dot chart for which the unit of measurement
on one axis is the standard deviation of one of the variables,
and the unit on the other axis is the standard deviation
of the other variable (see Figure 3).

[Figure 3. CORRELATION BETWEEN JULY PRECIPITATION AND YIELD OF CORN IN OHIO.
Dot chart with both axes graduated in standard deviations, from -3σ to +3σ.
Unit σ_x = 1.4 in.
A'B' = line of relation.
CD = line of relation for perfect correlation.
Coefficient of correlation = tan X'OB'.]




3. The position of the straight line which most nearly
satisfies the data on the second dot chart may be determined
rigidly by the method of least squares. When the standard
deviation of one variable is used as the unit of the ordinates
and the standard deviation of the other variable as the unit
of the abscissae, the angles between this straight line of closest
fit and the axes are significant. If these angles are equal,
i. e. each 45°, the relationship between the variables is perfect
(see C-D in Figure 3). If the line coincides with one axis or
the other no relationship is shown, although the converse is
not necessarily true.* Positions between these two show
partial relationship (see A'B' in Figure 3).

4. The coefficient of correlation is merely a statement of
the position of the straight line of closest fit on a chart where
the units are the standard deviations of the variables as this
position is determined by the least square adjustment.† The
coefficient of correlation is expressed as the tangent of the
angle made by the line of closest fit and the axis to which it is
more nearly parallel (e. g. angle X'OB' in Figure 3 is 27¾°,
tan X'OB' = +0.526). In actual practice the coefficient of
correlation may be determined mathematically from the data
as shown in Table I without plotting the material on a dot
chart, like Figure 3. However, the coefficient should never
be attempted without first investigating the relationship far
enough to see if it follows a straight line. That is, steps 2
and 3 may be omitted in practice; step 1 should never be
omitted.

5. If the examination of the correlation table or dot chart
shows that the relation is not that of a simple straight line,
the coefficient of correlation is not a measure of the relationship
between the variables.
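
Steps 2 to 4 can be checked numerically. The Python sketch below is an illustration only and is not the procedure of the original paper: the function names are assumptions, and the sample is limited to the first ten years of Table I. It expresses both variables in units of their standard deviations, fits the least squares line of y on x, and compares the slope of that line with the coefficient of correlation computed directly.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    """Standard deviation about the true mean (divisor n, as in the text)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def standardize(values):
    """Express each value as a departure from the mean in units of sigma."""
    m, s = mean(values), std_dev(values)
    return [(v - m) / s for v in values]

def least_squares_slope(xs, ys):
    """Slope of the least squares line of y on x."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std_dev(xs) * std_dev(ys))

# First ten years of Table I (July rainfall in inches, yield in bushels per acre).
rain = [2.6, 5.8, 2.6, 4.9, 4.7, 1.6, 5.8, 3.3, 3.6, 2.6]
corn = [26.0, 39.7, 27.7, 36.6, 27.7, 29.5, 38.2, 33.5, 30.0, 27.0]

slope = least_squares_slope(standardize(rain), standardize(corn))
r = pearson_r(rain, corn)
print(f"slope on the standardized chart = {slope:.4f}, r = {r:.4f}")

# Angle corresponding to the coefficient found in the paper for all sixty years.
paper_angle = math.degrees(math.atan(0.526))
print(f"angle whose tangent is 0.526 = {paper_angle:.2f} degrees")
```

On the standardized chart the slope and the coefficient agree to within rounding error, which is the sense in which the coefficient is the tangent of the angle between the line of closest fit and the axis.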

LIMITATIONS OF THE COEFFICIENT OF CORRELATION. 

It is clear even from a superficial study of the question that 
the coefficient of correlation obtained from material where a 
straight line relationship does not obtain may be too small, 

* Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912, pp. 174-175.
† Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912, p. 172.







but will never be too large.* A coefficient of correlation may
be near zero when there is very close relationship, as is shown
in such a condition as the relationship between the height
of high water and the phase of the moon which is shown
for Old Point Comfort, Va., by Table III and Figure 4.
The figure indicates that the relation is harmonic; although
there is a close and very definite relation between the
phenomena, the coefficient of correlation is near zero
(-0.106±.088) because the different portions of the curve
of regression are in such relations to each other that a
straight line along an axis will most nearly satisfy all the
points. Of course the angle is then zero and its tangent is
zero.
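
The point may be illustrated with artificial data. In the Python sketch below the heights are generated from a hypothetical cosine curve with a period of 15 days; they are not the Old Point Comfort predictions, and the constants 2.6 and 0.3 are assumptions chosen only to resemble the scale of Table III. Although the height is determined exactly by the day, the coefficient of correlation comes out close to zero, whereas a straight-line relation gives a coefficient of unity.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation computed about the true means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

days = list(range(60))

# Hypothetical harmonic relation: four complete cycles in sixty days.
harmonic_height = [2.6 + 0.3 * math.cos(2 * math.pi * d / 15) for d in days]

# Hypothetical straight-line relation for comparison.
linear_height = [2.0 + 0.01 * d for d in days]

r_harmonic = pearson_r(days, harmonic_height)
r_linear = pearson_r(days, linear_height)
print(f"harmonic relation: r = {r_harmonic:+.3f}")
print(f"linear relation:   r = {r_linear:+.3f}")
```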

* See Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912,
p. 175, and Brown, W.: The Essentials of Mental Measurement, Cambridge, University Press, 1911,
pp. 27-59.






TABLE III.

CORRELATION OF TIME AFTER NEW MOON AND PREDICTED HEIGHT OF THE
HIGHER HIGH WATER AT OLD POINT COMFORT, VA.

(U. S. Coast and Geodetic Survey, General tide tables for the year 1916, p. 103.)

Days after
New Moon                       Height above
July 29, 1916.    x.     x².     M. L. W.       y.     y².     xy.

      0          -30     900       2.7        + .1    .01    - 3.0
      1          -29     841       2.6           0      0        0
      2          -28     784       2.6           0      0        0
      3          -27     729       2.5        - .1    .01    + 2.7
      4          -26     676       2.4        - .2    .04    + 5.2
      5          -25     625       2.4        - .2    .04    + 5.0
      6          -24     576       2.5        - .1    .01    + 2.4
      7          -23     529       2.5        - .1    .01    + 2.3
      8          -22     484       2.5        - .1    .01    + 2.2
      9          -21     441       2.6           0      0        0
     10          -20     400       2.7        + .1    .01    - 2.0
     11          -19     361       2.8        + .2    .04    - 3.8
     12          -18     324       2.9        + .3    .09    - 5.4
     13          -17     289       3.0        + .4    .16    - 6.8
     14          -16     256       3.1        + .5    .25    - 8.0
     15          -15     225       3.1        + .5    .25    - 7.5
     16          -14     196       3.0        + .4    .16    - 5.6
     17          -13     169       2.9        + .3    .09    - 3.9
     18          -12     144       2.9        + .3    .09    - 3.6
     19          -11     121       2.9        + .3    .09    - 3.3
     20          -10     100       2.7        + .1    .01    - 1.0
     21          - 9      81       2.6           0      0        0
     22          - 8      64       2.5        - .1    .01    +  .8
     23          - 7      49       2.4        - .2    .04    + 1.4
     24          - 6      36       2.4        - .2    .04    + 1.2
     25          - 5      25       2.4        - .2    .04    + 1.0
     26          - 4      16       2.5        - .1    .01    +  .4
     27          - 3       9       2.5        - .1    .01    +  .3
     28          - 2       4       2.6           0      0        0
     29          - 1       1       2.6           0      0        0
     30            0       0       2.6           0      0        0
     31          + 1       1       2.5        - .1    .01    -  .1
     32          + 2       4       2.6           0      0        0
     33          + 3       9       2.6           0      0        0
     34          + 4      16       2.7        + .1    .01    +  .4
     35          + 5      25       2.7        + .1    .01    +  .5
     36          + 6      36       2.6           0      0        0
     37          + 7      49       2.6           0      0        0
     38          + 8      64       2.6           0      0        0
     39          + 9      81       2.6           0      0        0
     40          +10     100       2.7        + .1    .01    + 1.0
     41          +11     121       2.8        + .2    .04    + 2.2
     42          +12     144       2.9        + .3    .09    + 3.6
     43          +13     169       2.9        + .3    .09    + 3.9
     44          +14     196       2.9        + .3    .09    + 4.2
     45          +15     225       3.1        + .5    .25    + 7.5
     46          +16     256       3.1        + .5    .25    + 8.0
     47          +17     289       3.0        + .4    .16    + 6.8
     48          +18     324       2.9        + .3    .09    + 5.4
     49          +19     361       2.7        + .1    .01    + 1.9
     50          +20     400       2.5        - .1    .01    - 2.0
     51          +21     441       2.4        - .2    .04    - 4.2
     52          +22     484       2.3        - .3    .09    - 6.6
     53          +23     529       2.2        - .4    .16    - 9.2
     54          +24     576       2.3        - .3    .09    - 7.2
     55          +25     625       2.3        - .3    .09    - 7.5
     56          +26     676       2.4        - .2    .04    - 5.2
     57          +27     729       2.4        - .2    .04    - 5.4
     58          +28     784       2.5        - .1    .01    - 2.8
     59          +29     841       2.6           0      0        0
     60          +30     900       2.8        + .2    .04    + 6.0

Sum of negative departures                    -3.9
Sum of positive departures                    +6.9
Totals                     0   18910          +3.0   3.24    -25.1






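$M'_x = 30$, $M'_y = 2.6$

\[ M_x = 30 \qquad M_y = 2.6 + \frac{3.0}{61} = 2.65 \]

\[ \Sigma x = 0 \qquad \Sigma y = +3.0, \quad \frac{\Sigma y}{n} = +.05 \]

\[ \Sigma x^2 = 18910 \qquad \Sigma y^2 = 3.24 \]

\[ \sigma_x = \sqrt{\frac{18910}{61}} = 17.6 \qquad
   \sigma_y = \sqrt{\frac{3.24}{61} - \left(\frac{3.0}{61}\right)^2} = .22 \]

\[ \Sigma xy = -25.1 \]

\[ r = \frac{\dfrac{-25.1}{61}}{17.6 \times .22} = \frac{-.411}{3.87} = -.106 \pm E_r \]

\[ E_r = .674\,\frac{1 - (-.106)^2}{\sqrt{61}} = .674 \times \frac{1 - .0112}{7.8}
       = 0.674 \times 0.13 = .088 \]

\[ r = -0.106 \pm 0.088 \]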

When the relation is not linear the concomitant variation 
may be shown by the use of a "correlation ratio," which is 
simply a further development of the theory of correlation.* 

It is, however, not the purpose of this paper to consider 
relationships shown by curves of a higher order than a straight 
line, as such correlations involve more complicated mathemat- 
ical theory and also require many more observations to be 
significant. 
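
For readers who wish to see how a correlation ratio behaves, the following Python sketch may serve. It assumes the usual definition of the ratio as the standard deviation of the class means of y divided by the standard deviation of y; the class width of 3 days, the function names, and the hypothetical cosine data are all assumptions of this illustration and are not taken from the paper or from the tide tables.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Coefficient of correlation computed about the true means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def correlation_ratio(xs, ys, x_width):
    """Correlation ratio of y on x, with x grouped into classes of x_width."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[math.floor(x / x_width)].append(y)
    grand_mean = sum(ys) / len(ys)
    # Squared departures of the class means, weighted by class frequency.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    total = sum((y - grand_mean) ** 2 for y in ys)
    return math.sqrt(between / total)

# Hypothetical harmonic data: four complete 15-day cycles in sixty days.
days = list(range(60))
height = [2.6 + 0.3 * math.cos(2 * math.pi * d / 15) for d in days]

r = pearson_r(days, height)
eta = correlation_ratio(days, height, x_width=3)
print(f"coefficient of correlation r = {r:+.3f}")
print(f"correlation ratio eta        = {eta:.3f}")
```

Under these assumptions the ratio is large while the coefficient is near zero, which is the contrast the text describes. The value of the ratio depends on the class width chosen, so it should be reported together with the grouping used.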

ADEQUACY OF THE COEFFICIENT OF CORRELATION. 

The conclusion seems legitimate that the coefficient of 
correlation may be used strictly as a measure of relationship, 

* See Pearson, K.: Mathematical Contributions to the Theory of Evolution: 14. On the general theory
of skew correlation and non-linear regression. London, Drapers' Company Research Memoirs, Biometric
Series 2, 1905. Brown, W.: The Essentials of Mental Measurement. Cambridge, University
Press, 1911, pp. 57-59.




when such relationship has been determined by other investi- 
gation to follow straight line relations. The use of the coeffi- 
cient of correlation is to be recommended because it is 
independent of the personal equation of the investigator, and 
of the units employed, and because it shows rigidly the correct 
position of the line indicated by the dot chart. 

In using the coefficient of correlation it is desirable to cal- 
culate the probable error (see Tables I and III for method).* 
The probable error is that divergence from the observed mean 
on either side within which half the observations lie. Its 
size is a measure of how closely the results from an infinite 
number of cases would correspond with those obtained from 
the observed cases. When the coefficient of correlation is not 
greater than its probable error there is no evidence that there 
is any correlation; but when the coefficient of correlation is 
clearly greater than its probable error correlation is indicated; 
and when it is much greater (six times as great is an accepted
empirical amount) it may be safely assumed that there is concomitant
variation.†
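
The rule of thumb in this paragraph can be written out as a small routine. The sketch below is an illustration in Python and not a procedure given in the paper: the thresholds of one and six probable errors are taken from the text, while the treatment of every intermediate case as a single class, and the wording of the verdicts, are assumptions.

```python
import math

def probable_error(r, n):
    """Probable error of r as defined in the text: .674 (1 - r^2) / sqrt(n)."""
    return 0.674 * (1.0 - r * r) / math.sqrt(n)

def interpret(r, n):
    """Return (probable error, ratio of |r| to it, verdict)."""
    e_r = probable_error(r, n)
    ratio = abs(r) / e_r
    if ratio <= 1.0:
        verdict = "no evidence of correlation"
    elif ratio >= 6.0:
        verdict = "concomitant variation may be safely assumed"
    else:
        verdict = "correlation indicated, but not established"
    return e_r, ratio, verdict

# The two coefficients worked out in Tables I and III.
for label, r, n in [("July rainfall and corn yield", 0.526, 60),
                    ("days after new moon and tide height", -0.106, 61)]:
    e_r, ratio, verdict = interpret(r, n)
    print(f"{label}: r = {r:+.3f}, E_r = {e_r:.3f}, "
          f"ratio = {ratio:.1f}: {verdict}")
```

The rainfall coefficient is more than eight times its probable error, while the tidal coefficient only slightly exceeds its probable error, which agrees with the treatment of the two cases in the text.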

The coefficient of correlation is obtained by applying the 
least square adjustment to all the material and is, therefore, 
the straight line of closest fit. If the relationship is not that 
of a straight line, it is obvious that the straight line of closest 
fit is not a good measure of the relationship and that some 
other measure (e. g., the correlation ratio) must be used. 
Therefore, the coefficient of correlation should never be used 
to show relationship until after the phenomena have been 
investigated, at least far enough to show whether a straight 
line satisfies the relationship as well as any other curve. 

LITERATURE. 

The development of the theory of correlation resulting in 
the adoption and use of the coefficient of correlation is, of 
course, largely mathematical. While the literature on the 
subject is considerable, the greater part of the contributions 
are concerned with the application of the coefficient to par- 

* For a general discussion of the significance of probable error see Yule, G. U.: Introduction to the
Theory of Statistics, ed. 2, London, Griffin & Co., 1912, pp. 310-311.
† See Bowley, A. L.: Elements of Statistics, ed. 3, New York, Scribner, 1907, p. 320.




ticular problems, and hence the development of the theory of 
correlation is incidental and widely scattered. 

" The fundamental theorems of correlation were for the first 
time and almost exhaustively discussed by A. Bravais* . . .
[more than] half a century ago. He deals completely with the
correlation of two and three variables. Forty years later Mr.
J. D. Hamilton Dickson† dealt with a special problem proposed
to him by Mr. Galton, and reached on a somewhat narrow
basis some of Bravais' results for correlation of two variables.
Mr. Galton at the same time introduced an improved notation
which may be summed up in the 'Galton function' or coefficient
of correlation. This indeed appears in Bravais' work,
but a single symbol is not used for it. In 1892 Professor Edgeworth,
also unconscious of Bravais' memoir, dealt in a paper
on 'Correlated Averages' with correlation for three variables.‡
He obtained results identical with Bravais, although expressed
in terms of 'Galton's functions.'"§

The following publications contain complete statements of 
the later development: 

Pearson, Karl: Contributions to the mathematical theory of evolution;
London, Royal Society, Philosophical Transactions, Series A, as
follows:

1. On the dissection of frequency curves, Vol. 185, 1894, pp. 71-110.

2. Skew variations in homogeneous material, Vol. 186, 1895, pp. 343-414.

3. Regression, heredity, and panmixia, Vol. 187, 1896, pp. 253-318.

4. On the probable errors of frequency constants and on the influence
of random selection on variation and correlation, Vol. 191, 1898,
pp. 229-311.

5. On the reconstruction of the stature of prehistoric races, Vol. 192,
1898, pp. 169-244.

6. Genetic (reproductive) selection; inheritance of fertility in man and
of fecundity in thoroughbred race horses, Vol. 192, 1899, pp. 257-330.

7. On the correlation of characters not quantitatively measurable,
Vol. 195, 1900, pp. 1-47.

* Analyse mathématique sur les probabilités des erreurs de situation d'un point. Paris, Académie des
Sciences, Mémoires présentés par divers savants. Series 2, Vol. 9, 1846, pp. 255-332.

† Appendix to Galton, F.: Family Likeness in Stature. London, Royal Society, Proceedings, Vol. 40,
1886, pp. 63-73.

‡ London, Philosophical Magazine, Series 5, Vol. 34, 1892, pp. 190-204.

§ Pearson, Karl: London, Royal Society, Philosophical Transactions, Series A, Vol. 187, 1896, p. 261.




8. On the inheritance of characters not quantitatively measurable,
Vol. 195, 1900, pp. 75-150.

9. On the principle of homotyposis and its relation to heredity, to the
variability of the individual, and to that of the race, Vol. 197,
1901, pp. 285-379.

10. Supplement to a memoir on skew variation, Vol. 197, 1901, pp. 443-459.

11. On the influence of natural selection on the variability and correlation
of organs, Vol. 200, 1902, pp. 1-66.

12. On a generalized theory of alternative inheritance with special
reference to Mendel's Laws, Vol. 203, 1904, pp. 53-86.

In London, Drapers' Company Research Memoirs, Biometric Series.

13. On the theory of contingency and its relation to association and
normal correlation. Memoir 1.

14. On the general theory of skew correlation and non-linear regression.
Memoir 2.

15. On the mathematical theory of random migration. Memoir 3, 1906.

16. On further methods of determining correlation. Memoir 4, 1907.

17. [Not published.]

18. On a novel method of regarding the association of two variates
classed solely in alternate categories. Memoir 7, 1912.

Pearson, Karl: On the partial correlation ratio. London, Royal Society,
Proceedings, Series A, Vol. 91, 1915, pp. 492-498.

Brown, W.: The essentials of mental measurement, Cambridge, University
Press, 1911.

Elderton, W. P.: Frequency curves and correlation. London, Layton
Brothers, 1906.

Hooker, R. H.: Correlation of successive observations, Royal Statistical
Society Journal, Vol. 68, pp. 676-703.

Tolley, H. R.: The theory of correlation as applied to farm survey data
on fattening baby beef, U. S. Department of Agriculture Bul. 504,
Washington, Govt. Ptg. Office, 1917.

Walker, Gilbert T.: Correlation in seasonal variation of weather,
Indian Meteorological Department Memoirs, Simla, 1909-1915.

1. Correlation in season variation of climate, Vol. 20, part 6, 1909,
pp. 117-124.

2. (A) On the probable error of a coefficient of correlation with a group
of factors.
(B) Some applications of statistical methods to seasonal forecasting,
Vol. 21, part 2, 1910, pp. 22h15.

3. On the criterion for the reality of relationships or periodicities,
Vol. 21, part 9, 1914, pp. 13-16.

4. Sunspots and rainfall, Vol. 21, part 10, 1915, pp. 17-60.

5. Sunspots and temperature, Vol. 21, part 11, 1915, pp. 61-90.

6. Sunspots and pressure, Vol. 21, part 12, 1915, pp. 91-118.




Yule, G. Udny: Introduction to the theory of statistics, ed. 2, London, 
C. Griffin & Co., 1912, pp. 157-253. 

More elementary discussions are contained in the following 
papers: 

Persons, W. M.: The correlation of economic statistics. Boston, Ameri- 
can Statistical Association, Quarterly Publications, Vol. 12 (1910), 
pp. 287-322. 

Hooker, R. H.: An elementary explanation of correlation: illustrated by
rainfall and the depth of water in a well; London, Royal Meteorological
Society Quarterly Journal, Vol. 34, 1908, pp. 277-291.

Elderton, W. P. and E. M.: Primer of statistics, London, A. and C.
Black, 1910, pp. 55-72.

King, W. I.: Elements of statistical method, New York, Macmillan, 1912,
pp. 197-215.

Dines, W. H.: The practical application of statistical methods to meteorology.
London, H. M. Meteorological Office, The computer's handbook
(M. O. 223), section 5, part 2, 1915, pp. V29-V52.

The most complete bibliographies will be found in:

Yule, G. Udny: Introduction to the theory of statistics, London, C.
Griffin & Co., 1912, pp. 188, 208-209, 225-226, and 252.

Davenport, C. B.: Statistical methods with special reference to biological
variation, third, revised edition, New York, J. Wiley & Sons, 1914,
pp. 62 and 85-104.



